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ABSTRACT 


This  thesis  proposes  implementations  of  the  sort  an d 
join  in  the  Multiple-Backend  Database  System.  The  idea  of 
implementing  these  operations  is  to  provide  better  support 
for  relational  databases  and  relational  language  interfaces. 
The  key  issue  analyzed  is  the  distribution  of  functionality 
of  the  operation  across  the  multiple  minicomputer s  of  the 
MDBS  architecture.  The  join  analysis  also  examines 
alternative  join  algorithms. 
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A  current  research  effort  at  the  Naval  Dostgraduate 
School  is  the  investigation  of  the  idea  of  a  database 
kernel.  It  is  proposed  that  the  attr ibute-based  data  model 
and  the  attribute-based  data  language  (A3DL)  is  used  as  .a 
kernel  to  support  relational,  hier archical ,  and  network 
databases.  A  prototype  software  database  svstem,  the  Muiti- 
3acke.nd  Database  System  (MDBS),  which  uses  the 
attribute-based  data  model,  is  the  target  kernel  system. 

The  operations  of  the  attribute-based  data  language  are 
RETRIEVE,  INSERT,  DELETE,  and  'JPDATE ,  the  four  p-imary 
operations  of  any  database  management.  One  proposal  is  that 
additional  operations  be  implemented  in  MDBS  to  provide  a 
more  complete  database  kernel.  In  this  thesis,  we 
investigate  the  addition  of  a  sorting  capability  and  the 
relational  join  operation. 

MDBS  is  a  multiple-processor  system.  The  interesting 
issue,  when  considering  the  implementation  of  the  sort  and 
join  operations,  is  the  distribution  of  functionality  among 
the  multiple  processors.  In  this  thesis,  we  propose  and 
analyze  various  d istr ibut ions  of  the  functionality . 


In  analyzing  the  issues  of  alternative  distributions  of 
the  functions,  our  approach  will  be  to  use  the  existing 
functional  units  in  MDBS.  We  propose  alternatives,  ari 
evaluate  then  according  to  the  design  goals  of  MDBS.  Dur 
proposals  require  minimal  interface  changes  among  the 
functional  units. 

We  will  approach  the  issues  in  the  following  manner.  We 
will  make  a  number  of  proposals.  We  will  analyze  the  time 
complexity  of  the  proposals.  Then,  based  on  the  MDBS  design 
goals  and  the  complexity  analyses,  we  will  make  specific 
recommendat ions . 

A.  THE  ORGANIZATION  OF  THE  THESIS 

In  the  rest  of  the  thesis,  we  examine  the  distribution 
of  functionality  for  the  sort  and  join  operations. 
Specifically,  Chapters  II  through  V  cover  the  sort  function . 
Then,  Chapters  VI  through  VIII  cover  the  join . 

In  Chapter  II,  we  give  a  brief  review  of  the  MDBS 
hardware  and  sotware  architectures.  In  Chapter  III,  we 
present  the  general  assumptions  and  notation  used  in 
analyzing  the  alternatives.  In  Chapter  IV,  we  consider  the 
distribution  of  functionality  among  the  controller  and  the 
backends.  In  Chapter  V,  we  consider  specific  algorithms  for 
introducing  the  sorting  function.  We  also  examine  the  case 
where  a  particular  sorting  task  does  not  fit  the  MDBS 
archi tecture .  We  discuss  how  the  sort  function  might 
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incorporate  feartures  of  the  MDBS  softwar 
well. 

Chapter  VI  introduces  the  join.  In 
examine  the  alternative  distributions  of 
among  the  controller  and  the  backends.  In 
specific  join  algorithm,  the  sort-match  j 
examined  in  the  context  of  MDBS,  -inally,  i 
summarize  our  conclusions  and  discuss  the 
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Chapter 
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n  Chapte 
c  o  n  t  r  i  b 


the  thesis 


II.  A  REVIEW 


■j  r 


■_  THE  MD  3S  HARDWARE  AMD 
SOFTWARE  ARCHITECTURES 

MDBS  is  a  multiple  minicomputer  system  that  uses  off- 
I  the- she if  hardware  and  special-purpose  software  in  an  inno¬ 

vative  configuration  to  support  high-performance  database 
operations  and  large-capacity  databases.  An  overview  of  the 
|  MDBS  hardware  organization  is  shown  in  Figure  0.1.  The  back¬ 

ends  and  the  controller,  which  are  gener al-purpose  minicom¬ 
puters,  are  connected  by  a  broadcast  bus.  The  controller* 
I  will  broadcast  each  request  to  all  backends  at  the  same 

time.  The  backends  process  the  request,  and  send  the  results 
to  the  controller  via  the  broadcast  bus.  Intercommunication 
between  the  backends  is  also  via  the  broadcast  bus.  Every 
backend  has  its  own  dedicated  disk  drives.  Reader  should 
refer  to  [Ref.  1,  2,  3]  for  more  detail. 

A.  DESIGN  GOALS  FOR  MDBS 

The  major  problem  for  conventional  database  systems  is 
their  inability  to  achieve  high  performance  as  the  database 
grows  and  the  rate  of  requests  increases.  In  order  to  over¬ 
come  this  problem,  a  high-performance  multi-backend  database 
system  should  have  the  following  properties. 


(1)  The  throughput  impro venent  is  proportional  to  the 
number  of  backends.  That  is,  if  the  number  of  backends  is 
doubled,  it  should  be  possible  to  nearly  double  the  size 
of  the  database  without  affecting  the  throughput. 

(2)  The  response  time  is  inversely  proportional  to  the 
number  of  backends.  It  should  be  possible  to  nearly  halve 
the  average  response  time  by  doubling  the  number  of 
backends . 

(3)  The  system  is  extensible  for  capacity  growth  and/or 
performance  improvement.  By  extensibility,  we  mean  that  an 
upgrade  of  the  system  can  be  made  with  no  modification  to 
the  existing  hardware  and  software,  and  no  major 
disruption  of  the  system  activity. 

To  meet  the  MDBS  design  goals,  the  controller  is 
implemented  with  the  following  goals.  The  amount  of  the  work 
that  the  controller  should  perform  must  be  minimized  in 
order  to  avoid  controller  bottleneck  problems. 
Communication  between  the  controller  and  the  backends  must 
also  be  minimized  in  order  to  avoid  bus  contention.  <\s  a 
consequence  of  the  controller  implementation  goals,  the 
backends  should  do  most  of  the  work.  Further,  the 
communication  among  the  backends  must'  be  minimized. 
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3.  THE  MDBS  SOFTWARE 


ARCH 


R  E 


MD3S  is  designed  to  provide 


database  g  r 
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performance  enhancement  by  the  addition  of  identical 
backends  and  their  disks.  The  software  architecture  does  not 
require  the  development  of  new  software  when  a  backend  is 
added.  In  other  words,  the  existing  software  supports  many 
backends  as  well  as  a  few  backends.  The  software 
architecture  allows  replication  of  the  existing  software  for 
the  new  backends  added  for  expansion.  *!o  new  software  is 
developed.  Reconfiguration  is  simple,  and  does  not  require 
extensive  system  regeneration .  The  software  architecture  of 
MD9S  is  shown  in  Figure  2.2.  For  more  detail,  refer  to  [Ref. 
1,  2,  3J. 


The  software  architecture  also  takes  full  advantage  of 
the  parallelism  in  the  hardware  architecture.  The  software 
of  the  backends  supports  parallel  processing  of  the 
database.  There  are  three  primary  features  which  support 
this  parallelism.  The  first  is  the  method  by  which  the 
database  is  distributed  over  the  disk  drives  of  the 
backends . 

The  data  model  chosen  for  the  system  is  the  attribute- 
based  data  model  [Ref.  1].  In  MDBS  the  database  consists  of 
files  of  records.  Each  record  is  a  collection  of  keywords  , 
optionally  followed  by  a  record  body.  A  keyword  is  made  up 
of  an  attribute-value  pair.  A  record  body  is  string  of 
characters  not  used  by  MDBS  for  search  purposes.  In 


particular,  the  first  attribute-value  pair  of  each  record  of 
a  file  consists  of  the  attribute  FILE  and  the  file  name  as 
its  value.  For  performance  reasons,  records  are  logically 
grouped  into  clusters  based  on  the  attribute  values  and 
attribute  value  ranges  in  the  records.  These  values  and 
value  ranges  are  called  descr iptors .  At  database  creation 
time,  the  database  creator  specifies  a  number  of 
descriptors.  These  descriptors  are  called  as  clustering 
descriptors  that  are  used  for  forming  clusters  of  records. 
An  attribute  that  appears  in  a  descriptor  is  called  a 
directory  attribute .  -or  the  purposes  of  clustering,  only 
those  keywords  of  the  records  which  contain  directory 
attributes  are  considered.  Such  keywords  of  the  record  are 
termed  directory  keywords . 

This  concept  of  clusters  contributes  to  parallel 
processing  in  the  following  manner.  The  distribution  of 
data  across  the  backends  is  based  on  the  concept  of 
clusters.  The  records  of  a  cluster  are  distributed,  accross 
the  backends  according  to  the  distribution  algorithm 
proposed  in  [Ref.  1].  Therefore,  each  backend  has  a  part  of 
the  cluster.  Thus,  each  backend  may  access  a  portion  of  the 
data  required  by  a  request.  All  backends  can  work  and  access 
their  portions  in  parallel. 

The  second  feature  of  the  software  architecture  which 
exploits  the  paralellism  is  the  way  in  which  directory  data 
is  managed.  Every  backend  has  its  own  copy  of  the  clusters. 
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The  search  for  the  descriptors  related  to  3  request  can  thus 
be  shared  by  all  of  the  backends. 

The  third  feature  which  supports  paralellism  is  the 
method  used  for  scheduling  requests  and  controlling 

concurrent  access  to  the  database  and  the  directory  d  a  v  a  . 
£ach  backend  keeps  a  request  queue.  Requests  are  schedu.ed 
independently ,  as  resources  become  available.  Concurrency  ts 
maintained  separately  at  each  backend  with  a  Icccog 
algorithm.  Thus,  the  backends  work  independent! y  and 
parallel.  In  exploring  alternatives  for  the  sort  and  ;:i- 
operations,  we  will  preserve  this  idea  of  independent, 
parallel  processing  in  the  backends. 
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Ill . 


ALTERNATIVES  FOR  DISTRIB 


‘IMG  THE  SCRTi: 


When  considering  alternatives  for  distributing  the  sort¬ 
ing  function  among  the  processors  of  MDBS,  we  must  consider 
both  the  hardware  and  software  architectures.  The  hardware 
and  software  architectures ,  as  explained  in  Chapter  II,  are 
designed  for  distributing  the  functionality  of  the  database 
management  operations  across  the  backends.  We  must  select 
an  alternative  that  exploits  the  inherent  parallelism  of  the 
architecture.  The  architecture  of  ^D3S  dictates  minimal 
controller  function,  minimal  message  traffic,  and  identical 
software  for  the  backends.  The  alternatives  which  we  recom¬ 
mend  should  be  consistent  with  the  dictations  made  on  the 
existing  hardware  and  software  architectures . 

We  will  consider  the  complexity  of  the  sort  function  to 
include  only  the  overhead  incurred  by  adding  an  ordering 
specification  to  a  RETRIEVE  request,  the  time  required  to 
retrieve  records  is  not  considered.  We  will  develop  expres¬ 
sions  which  represent  the  CPU  activity,  expressions  which 
represent  the  I/O  activity,  and  expressions  which  represent 
the  communication  activity  on  the  bus,  i.e.,  the  computing 
complexity  ,  the  access  complexity  ,  and  the  communication 
complexity  ,  respectively. 

When  analyzing  complexity  for  functions  distributed 
accross  the  backends,  remember  that  the  backends  are  working 
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3  c  r  3  3  s 


parallel.  The  result  of  this  distribution  of  work  across  3 
backends  operating  in  parallel  is  that  the  1  inear  conolex- 
ity ,  the  sum  of  the  work  done  at  all  3  backends,  is  reduced 
to  an  effective  complexity,  the  work  required  at  the  or.e 
backend  which  does  the  most  work.  Assumption  5  is  that  the 
number  of  blocks  to  be  sorted  is  evenly  distributed  across 
the  backends.  Therefore,  since  each  backend  will  do  an 
equal  amount  of  work,  the  effective  complexity  is  equal  to 
the  complexity  at  any  one  backend. 


A.  ASSUMPTIONS 

In  each  case,  we  will  analyze  the  worst-case  complexity 
of  the  current  alternative.  In  order  to  simplify  the 
analysis,  we  make  the  following  assumptions. 

(1)  Internal  sorting  only  is  considered,  due  to  memory 
limitations.  The  backends  are  currently  15-bit 
minicomputers  with  a  fixed,  32  K-byte  address  space. 
Therefore,  memory  limitation  is  a  real  problem. 

(2)  All  records  in  a  block  are  to  be  sorted  (i.e., 
selection  of  records  is  performed  by  record  processing 
before  sorting) . 

(3)  Sorting  is  block-by-block  (i.e.,  a  block  of  records 


selected  by  the  record  processing  function  is  passed  to 
the  sorting  function,  where  they  are  sorted  and  stored  in 
the  secondary  storage  for  merging). 


'  4  )  '* e r g e  is  2 - w a y .  This  is  the  simplest  case.  W e  will 
consider  K-way  merge  in  Chapter  V. 


(5)  The  number  o 

f  bl 

ocks  to  be 

sorted 

is  evenly 

distributed  across 

the 

backends  ( i 

•  e  .  , 

i  r  3 r e 

.  blocks  to  be  sorted 

and 

B  backends, 

then 

each  backend 

sorts  ,J!/3  blocks). 

(5)  Some  sorting  algorithms  of  the  order  1  r*loe 
where  r  is  the  number  of  records,  will  be  used. 

(7)  Seconds  are  sorted  on  a  single  concatenated  kev  'i.e., 
only  a  single  execution). 

(3)  The  time  to  send  a  block  of  d3ta  across  the  broadcast 
bus  is  an  average  time,  which  will  be  represented  as  a 
constant . 

(9)  The  time  to  read  (or  write)  a  block  of  data  f-'orn  'or 
to)  the  disk  is  an  average  time,  which  will  be  represented 
as  a  constant,  and  is  the  same  for  the  controller  and  the 
backends . 

(10)  The  CPU  time  required  for  a  comparison  operation  is 
the  same  at  the  controller  and  at  the  backends. 

B.  NOTATION 

In  analyzing  the  time  complexity,  we  will  deal  with 
variables  which  represent  the  number  of  backends,  the  number 
of  records  to  be  sorted,  the  number  of  records  in  a  block, 
etc.  We  will  also  deal  with  certain  constants.  ror  example, 
according  to  assumption  8  above,  there  is  some  -  constant 
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which  represents  the  tine  required  to  send  a  ol o: k  of  data 
across  the  broadcast  bus.  For  uniformity,  we  define  the 
following  variables  and  constants  to  be  used  throughout  the 
analysis. 

(1)  3  =  the  number  of  backends  in  the  system. 

(2)  'I  =  total  number  of  records  to  be  sorted  for  a 
particular  request. 

(3)  r  =  the  number  of  records  in  a  block. 

(4)  b  =  the  number  of  blocks  to  be  sorted  at  the  b  .end. 
Mote  that  according  to  assumption  5  ,  b=M/(3*r).  To 
simplify  the  analysis,  we  will  assume  that  b  is  a  power 
of  2. 

(5)  log  :  stands  for  logarithm  to  the  base  2  unless 
otherwise  noted. 

C.  SYNTAX  FOR  THE  SORT  FUNCTION 

The  syntax  of  a  retrieve  request  in  MDBS  is  as  follows. 

RETRIEVE  Query  Target-list  [BY  attribute] [ WITH  pointer] 

That  is,  it  consists  of  five  parts.  The  first  part  is  the 
name  of  the  request.  The  second  part  is  a  query  which 
identifies  the  portion  of  the  database  to  be  retrieved.  The 
Target-list  is  a  list  of  elements.  Eack  element  is  either 
an  attribute  or  an  aggregate  operator  to  be  performed  on  an 


attribute.  The  fourth  part  of  the  request,  BY  clause,  is 

optional.  It  describes  the  whole  alternative  of  the 

attribute  such  that  BY  DEPT  means  every  department  in  the 
database.  The  fifth  part  of  the  request,  WITH  pointer,  is 

also  optional  which  specifies  whether  pointers  to  the 

retrieved  records  must  be  returned  to  the  use;-  or  user 

program  for  later  use  in  an  update  request  which  is  out  of 


our  concerns  for  sort  function. 

To  perform  the  sort  function,  we  first  need  to  retrieve 
the  records  that  are  relevant  to  the  user  request. 
Therefore,  modified  retrieve  request  can  be  used  as  a  syntax 
for  the  sort  function. 

With  modified  RETRIEVE  request,  we  may  consider  two 
different  alternatives  for  a  syntax  to  implement  the  join 
function  in  MDBS: 


1)  RETRIEVE  Query  Target-list  (ORDER_BY  ( Attr ibute_l i st_i ) ) 

2)  RETRIEVE  Query  ( 0RDER_3Y  (Attribute_list_1 ) , 

1  .  ( Attribute_list_2) ) 

In  both  alternatives,  the  first  two  parts  are  the  same 
as  in  regular  retrieve  request.  In  the  first  alternative, 
Target-list  clause  consists  of  the  attribute  names  with 
which  the  result  of  the  sort  function  is  given  to  the  user. 
0RDER_BY  clause  defines  the  function  to  be  performed  on  the 
retrieved  records.  Attribute  list  1  defines  the  list  of 
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attributes'  na-ies  with  which  the  retrieved  records 
sorted.  If  there  are  to  re  than  one  attribute  name  in 
Attr ibute_l i st_l ,  then  it  be  assumed  that  the  order  cf 
attributes  in  attr ibute_list_‘  gives  the  orcer 

implementation  of  consequi ti ve  sort  function  on 
retrieved  records.  Attr ibute_list  1  may  contain  ei 
Directory  Attribute(s)  or  non-directory  attributes  or  b 
The  important  ooint  is  that  each  attribute 
attr ibute_l ist_l  must  be  an  attribute  that  the  rec 
retrieved  from  database  obtain  it. 

In  the  second  alternative,  Atr ibute_l i st_1  includes 
attribute  names  with  which  the  record  are  sorted.  The  o 
of  performing  the  sort  function  on  the  records  is  again 
same  as  the  order  of  the  attributes  given 

attribute_list_1 . 

Attr ibute_l ist_2  includes  the  attribute  names  with  w 
the  result  of  the  sort  function  is  given  to  the  user 
other  words,  it  can  be  thought  as  a  target-list. 
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In  analyzing  the  distribution  of  function,  ve  will 
assume  that  the  sort  function  consists  of  two  phases:  the 
internal  sort  phase  and  the  merge  phase .  Because  of  main 
memory  limitations,  we  require  that  the  records  first  oe 
sorted  block-by-bloc-c.  The  sorted  clocks  are  stored  in 
temporary  storage  in  the  secondary  memory.  "This  is  done  by 
the  internal  sort  phase .  Sorted  blocks  will  then  be 
accessed  from  the  secondary  storage  and  merged,  "’’his  is  done 
by  the  merge  phase.  The  time  complexity  of  these  two 
processes  will  be  shown  separately.  At  the  end  of  the 
analysis  of  each  alternative,  the  total  time  complexity  will 
be  given. 

We  will  consider  three  alternatives  regarding 
distribution  of  function  through  the  system.  Since  MD3S 
consists  of  two  type  of  functional  units,  namely  the 
controller  and  the  backends,  the  possible  distributions  of 
functionality  are  the  following; 

A.  The  controller  performs  the  sort  function. 

3.  The  backends  perform  the  sort  function. 

C.  The  controller  and  the  backends  share  the  sort 
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function. 


We  will  analyze  these  three  alternatives  in  detail  in  the 
following  sections. 

A.  THE  CONTROLLER  PERFORMS  THE  SORT  FUNCTION 

In  this  alternative,  the  backends  perform  no  additional 
functions.  All  of  the  sorting  is  done  at  the  controller. 
The  backends  perform  the  selection,  projection,  an  o 
aggregation  operations  specified  in  the  RETRIEVE  reqiast, 
and  forward  the  result  records  to  the  controller.  The 
controller  accumulates  the  result  records  from  all  of  one 
backends,  and  sorts  them  in  the  order  specified  in  the 
RETRIEVE  request  before  forwarding  them  to  the  requester. 

There  is  no  change  in  the  functionality  of  the  backends. 
Therefore,  no  modification  of  the  software  of  the  backends 
is  required.  However,  at  least  two  processes  in  the 
controller  will  require  modification.  First,  the  request 
processing  process  must  be  augmented  to  recognize  the 
ordering  specification  of  the  request,  and  to  forward  the 
ordering  spec i f icat ion  to  the  post-processing  process.  The 
post-processing  process  must  be  augmented  to  recognize  that 
sorting  is  required,  and  to  accumulate  and  sort  result 
records  for  a  request  according  to  the  proper  ordering 
specification . 

First,  we  assume  that  all  blocks  for  a  query  have  been 
accumulated  and  stored  in  the  secondary  storage  of  the 
controller.  The  controller  will  have  (B*b)  blocks  to  sort. 
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The  internal  sort  p'nase  for  each  block  will  require 
0(r*logr)  time,  where  there  are  r  records  per  block.  The 
total  computing  complexity  for  the  internal  sort  phase  time 
is,  then, 


0(  3*b*r*log  r  ). 

2  *  3  *  b  accesses  to  the  secondary  storage  are  e  q  u  !  -  e  d 
during  the  internal  sort  phase.  So,  the  access  complexity 
of  the  internal  sort  phase  is 

0(  3*b  ). 

Since  there  are  (3*b)  blocks  at  the  controller,  log(3*b) 
will  be  the  number  of  passes  over  data  for  merging.  Each 
pass  will  require  (3*b*r  -  1)  comparisons.  So,  the 
computing  complexity  for  the  merge  phase  will  be 
( log ( B*b ) * ( B*b*r-  1 ) )  ,  which  is  in 

0(  B*b*r  *fiog  (  3*b)l  ). 

2*B*b*log  (B*b)  accesses  to  the  secondary  storage  are 
required  for  merging,  so  the  access  complexity  of  the  merge 
phase  is 

0(  B*b*flog  ( B*b)l  ). 

Therefore,  the  worst-case  computing  complexity  for  the 
sort  function  is 
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0(  3*b*r«( log  ( 3*b*r )  ) ,or 
0(  H  *  log  N  ), 

and  the  access  complexity  is 

0(  B*b*log(3*b)  ),  or 
0(  ( M/r ) * ( log ( M/r )  ). 

In  this  case,  since  all  sorting  and  merging  is  done  '  y 
one  processor,  the  controller,  the  effective  complexity  and 
the  linear  complexity  are  the  same. 

9.  THE  BACKENDS  PERFORM  THE  SORT  FUNCTION 

Here  we  consider  two  strategies.  In  the  first,  all  of 
the  backends  share  the  internal  sort  phase,  and  the  merge 
phase  is  performed  by  one  or  two  backends.  In  the  second, 
each  backend  sorts  and  merges  the  blocks  of  data  resident  at 
that  backend.  The  backends  then  share  the  work  of  merging 
with  B/2  backends  performing  the  first  partial  merge,  3/4 
backends  performing  the  next  partial  merge,  etc.  L at  js 
examine  each  of  these  strategies  in  datail. 

1 .  All  Backends  Sort ,  and  One  or  Two  Backends  Merge 

In  this  alternative  all  backends  perform  the 
internal  sort  phase  individually.  After  the  internal  sort 
phase  is  complete,  one  or  two  predetermined  backends 
complete  the  process  by  merging  all  of  the  sorted  blocks. 

So  each  backend  sorts  b  blocks  of  r  records.  The 
computing  complexity  of  the  internal  sort  phase  at  one 


backend  is 

0 (  o*r*log  r  )  , 

and  2*b  accesses  to  the  secondary  storage  are  required,  so 
the  access  complexity  of  the  internal  sort  phase  is 

0(b). 

This  is  the  effective  complexity  for  sorting.  Since  the  work 
of  sorting  is  shared  among  the  backends,  we  use  this 
effective  complexity  in  our  analysis. 

Next,  the  sorted  blocks  of  records  must  be 
transmitted  along  the  broadcast  bus  to  the  one  or  two 
backends  which  will  perform  the  merge  phase .  Let  us  take 
the  case  where  one  backend  does  all  the  merging.  Then,  if 
there  are  9  backends,  (3—1 ) * b  blocks  must  be  transmitted. 
The  communication  complexity  is 

0(B*b) . 

Also,  8*b  accesses  to  the  secondary  storage  are  required  to 
store  the  transmitted  blocks  at  the  backend  assigned  to 
perform  the  merge  phase.  This  requires  the  access  complexity 

0(  B*b  ). 

The  backend  selected  to  perform  the  merging  now  has 
(B*b)  blocks.  Merging  (B*b>  blocks  at  the  backend  requires 
the  time  of  ( B*b*r- 1 ) *log  (B*b).  The  computing  complexib /  is 


28 


7T7 


r.: 


. \l r-m '.t --a  1~.'  ■  V 


T*" 


t 

i 

1 


3(  3*b*r  4~log  C  B*b  )1  )  , 

since  2*3*b*log  (3*b)  accesses  to  the  secondary  storage  are 
required,  the  access  complexity  of  tne  merge  phase  at  this 
backend  is 

0(  3*b*log  (  3 * b )1  )  . 

Therefore,  the  total  computing  complexity  for  the 
sort  function  is 

0(  b*r*  ( log  r  +  3*log  (3*b)1)  ), 
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the  access  complexity  is 

0(  b»B*log  ( 3*b )  )  , 

and  the  communication  complexity  is  0(  3*b  ). 

2 .  All  Backends  Sort  Separately  and  Share  Merging 

In  this  strategy,  all  the  backends,  as  ii  *he 
previous  section,  share  the  work  of  sorting.  Therefore,  the 
computing  complexity  of  the  internal  sort  phase  is,  again, 
the  effective  complexity, 

0(  b*r*log  r  }  , 

and  the  effective  access  complexity  is  0(  b  ). 

Then,  each  backend  performs  the  merge  phase  over  its 
own  b  blocks.  This  requires  the  computing  complexity  of 
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and  the  access  complexity  of 

0(  b*log  b]  )  . 


Next,  the  merge  phase  is  shared  by  the  backend 
the  manner  shown  in  Figure  4.1.  First,  3/2  backends  pe 
a  merge  pass,  each  merging  2*b  blocks.  Then  3/d  ;  =  : 
perform  a  merge  pass,  each  merging  4  *  p  blocks.  This  o- 
is  repeated  log  3  times.  Mow  let  us  lock  at  the  come 
complexity  of  the  merge  phase  step  by  step. 


1.  step  (  2*b*r  -  1  )*  log  (2) 

2.  step  (  4*b*r  -  1  )*  log  (2) 

3.  step  (  3*b*r  -  1  )*  log  (2) 

4.  step  (  16*b*r  -  1  )*  log  (2) 

5.  step  (  32*b*r  -  1  )*  log  (2) 


i  S 

flog  B|  step  (  2  J  *b*r  -  1  )*  log  (2) 

The  expression  for  the  computing  complexity  of  the 
phase,  then,  as  derived  from  the  above,  is 

! 

0(  b*r  *  (  2  )  ). 


Again,  this  is  the  effective  complexity. 
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At  each  step,  each  target  backend  first  stores  the 
blocks  transmitted  from  their  neighbor  backends  before  tne 
merge  phase  starts.  This  requires  the  access  complexity 

0( 

Since  the  merge  phase  is  performed  in  log  3  steps, 
the  access  complexity  of  the  merge  phase  is  derived  as  the 
following . 

1.  step  (  2*0  ) *  1 o g ( 2 ) 

2.  step  (  4*b  )*log(2) 

3.  step  (  3*b  ) *  log ( 2 ) 

•  • 

•  • 

r,  j  * 

log  3  step  (2^  *b  )*log(2) 

Therefore,  the  effective  access  complexity  of  sharing  the 
merge  phase  for  this  alternative  is 

rU,&| 

0(  b*(  2  J  )  ). 

At  each  step,  one  half  of  the  total  number  of  blocks  must  be 
transmitted  over  the  broadcast  bus  to  the  target  backends 
for  the  next  step.  Since  there  are  log  3  steps,  the 
communication  complexity  between  the  backends  is 

0(  B*b*  I  log  sl  ) . 


C.  THE  CONTROLLER  AND  THE  BACKENDS  SHARE  THE  SORT  FUNCTION 
We  examine  two  strategies  for  distributing  the  sorting 
function  between  the  controller  and  the  backends.  The  first 
strategy  is  that  the  backends  perform  only  the  internal  so^t 
phase,  and  the  controller  performs  the  merge  phase.  The 
second  strategy  is  that  the  backends  perform  the  internal 
sort  phase  and  a  partial  merge,  merging  all  of  the  records 
in  the  blocks  stored  at  that  backend,  and  the  controller 
completes  the  merge  process.  Let  us  examine  each  of  tr.e 
strategies  in  detail. 

1 .  Backends  Sort  Slock  “b 1  -Block  and  Controller  Verges 
Every  backend  performs  the  internal  sort  phase  on 
its  part  of  the  file.  Each  block  is  sorted  and  forwarded 
directly  to  the  controller  for  merging.  The  time  complexity 
for  the  internal  sorting  of  a  block  is  0(  r  log  r  ),  where 
there  are  r  records  in  a  block.  The  effective  computing 
complexity  of  the  internal  sort  phase  is 

0(  b*r*log  ,r  )  , 

where  there  are  b  blocks  per  backend.  2*b  accesses  to  the 
secondary  storage  are  required,  so  the  effective  access 
complexity  is 

0(b). 

The  sorted  blocks  are  sent  to  the  controller  via  the 
broadcast  bus.  This  communication  cost  is  included  in  the 
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cost  of  3  P  Z  T  3 1  £  7 1  operation,  and  is  not  an  overhead  cost 
for  sorting.  However,  the  controller  must  store  !3*b) 
clocks  before  the  merge  pnase  starts.  This  requi-es  tr. e 
access  complexity 


3 (  3*b  )  . 

The  controller  now  has  2 * b  blocks  to  be  merged.  The 
computing  complexity  for  a  2- way  merge  is  ( 1 og i 2*b ; *  (  2* o * r - 
1)),  w.nich  is 

0 (  3*b*r *  log  (  3  *  b )  ). 

2*3*b*log ( 3*b )  accesses  to  the  secondary  storage  are 
required,  so  the  access  complexity  for  the  merge  phase  is 

3 (  3»b*log  (3*o)  ). 

So, the  computing  complexity  for  this  alternative  is 

0(  b*r*log  r  +  B*b*r*  log(B*b)1  )  or 
0(  b*r*(  log  r  +  log(B*b)l  )  ), 

and  the  access  complexity  is 

0(  b*B*  ilog(B*b)l  ). 

2 •  The  Backends  Sort  and  Perform  a  Partial  Merge ,  and 
the  Controller  Performs  the  Final  Merge 
In  this  case  every  backend  sorts  its  part  of  the 
requested  file,  and  the  controller  merges  those  partially 
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sorted  file  parts  being  sent  from  every  backend.  Since  tne 
backends  perform  the  internal  sort  pnase  block-by-oloc.-c,  tne 
effective  computing  complexity  of  internal  sort  pnase  is 

0 (  b‘r*log  r  )  . 

Assuming  that  every  internally  sorted  block  is  stored  oac  k 
into  the  secondary  storage,  the  access  complexity  is 

0(b). 


Mow  each  backend  merges  the  sorted  blocks  resident  at  that 
backend.  The  number  of  passes  over  the  data  required  ft’* 

f 

the  merge  phase  is  log  b.  'Therefore,  the  effective 
computing  complexity  of  merging  b  blocks  at  the  backend  is 
( b*r- 1 ) *  log  b ,  or 

OC  b*r*  flog  b]  )  , 
and  the  access  complexity  is 

‘  0(  b*  ilog  b] ) . 

So,  the  computing  complexity  for  the  internal  sort 
and  merge  phases  at  the  backends  is 

0(  b*r*  (log  r  +  log  bj)  ), 
and  the  access  complexity  is 


0(  b*  log  b I  )  . 


Communication  of  these  blocks  to  the  controller  is,  aga 
not  a  part  of  tne  sorting  cost.  However,  since 
transmitted  blocks  are  to  be  stored  before  the  merge  on 
starts,  this  requires  the  access  complexity 

Q(  3*b  ). 

The  controller,  now,  will  have  3  runs  of  so- 
records  to  be  merged.  The  logarithmic  value  of  the  number 
backends  gives  the  number  of  passes  over  data,  log  3. 
the  computing  complexity  of  tne  merge  phase  an 

controller  is 

0 (  3*b#r* 1  log  3]  )  , 
and  the  access  complexity  is 

0(  3*b*  log  b)  ). 

0.  EVALUATING  THE  ALTERNATIVES 

In  the  previous  sections  we  have  presented  f 
alternative  distributions  of  funtionality  between 
controller  and  the  backends.  In  this  section  we  w 

analyze  the  tradeoffs  of  the  alternatives.  Table 

summarizes  the  computing  complexities  of  the  internal  s 
and  the  merge  phases,  the  access  complexity,  and 
communication  complexity  for  all  five  alternatives. 


COMPUTING  COMPLEXITY  COMMUNICATION 


Table  1  The  Time  Complexities  for  the  Alternative  Distributions 
of  the  Sort  Function 


Alternative  A  represents  the  distribution  of  function 
presented  in  Section  A  of  this  chapter-.  The  controller 
performs  ail  of  tne  sorting  and  all  of  the  merging. 
Alternative  3.1  represents  the  distribution  p resented  in 
section  3.1  of  this  chapter.  All  of  the  backends  perform  the 
sorting  ,  and  one  or  two  backends  perform  the  merging  of  „i.e 
sorted  blocks.  Alternative  3.2  represents  tne  di st- i^ut ■' on 
of  function  presented  in  Section  3.2  of  this  c h a p t e r  .  All 
backends  perform  the  sorting  and  share  the  merging. 

Alternative  S.i  represents  the  distribution  of  function 
presented  in  Section  2.1  of  this  chapter.  Ail  the  backends 
perform  the  sorting  and  the  controller  performs  the  merging. 
Finally,  alternative  C.2  represents  the  distribution  of 
function  presented  in  Section  C.2  in  this  chapter.  3ackends 
sort  and  perform  a  partial  merge,  and  the  controller 
performs  final  merge. 

The  complexity  f ormulas  of  those  both  accesses  to  the 
secondary  storage  and  block  transmission  are  given  only  for 
the  additional  accesses  or  tr ansmissions  necessary  to 
complete  the  sort  function.  In  other  words,  accesses  to  the 
secondary  storage  to  retrieve  the  records  in  order  to 
perform  selection  and  projection  before  the  sort  function 
starts,  and  transmission  of  the  blocks  from  the  backends  to 
the  controller  are  not  included.  In  general,  each 
alternative,  except  A,  has  the  same  time  complexity  with 
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regard  do  the  internal  sort  phase.  Therefore,  we  will  focus 
on  the  other  columns  in  comparing  the  alternatives. 

First,  let  us  examine  alternative  A,  where  the 
controller  performs  all  sorting  and  merging.  The  computing 
complexity  is  0(3*b*r*  log(3*b)  )  for  sorting  and  merging 
( 3 * b )  blocks  of  r  records.  As  easily  seen,  this  alternative 
is  contrary  to  the  design  goal  of  the  minimizing  contr"1 1 er 
function.  Therefore,  we  will  eliminate  it  from  further 
considerations. 

Next,  let  us  examine  alternative  3.1,  where  ail  backer  is 
perform  the  sorting  and  one  or  two  backends  perform  toe 
merge.  The  backends  perform  all  of  the  work  of  sorting  and 
merging.  Even  though  this  alternative  seems  to  meet  the 
design  goal  of  minimizing  the  controller  function,  it  is 
contrary  to  the  second  design  goal  of  minimizing  the  message 
traffic  between  the  backends.  The  communocat ion  complexity 
is  0(B*b)  for  ( B * b )  blocks.  Clearly,  for  queries  involving 
a  large  number  of  blocks,  the  communication  overhead  will  be 
high  -and  bus  congestion  may  result.  Another  disadvantage  is. 
that  a  single  backend  performs  the  merging.  Also,  when  the 
single  backend  is  performing  the  merging,  it  may  delay  the 
processing  of  other  queries,  thus  causing  a  decrease  in 
system  throughput.  Because  of  the  communication  overhead 
and  the  potential  for  decrease  in  throughput,  we  will  also 
eliminate  alternative  3.1  from  further  cons  ider at  ion . 
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Next,  we  consider  alternative  3.2,  where  all  backends 
share  the  sorting  and  the  .merging.  The  communication 
complexity  is  3(3*b*log  3)  for  (3*b)  blocks.  The 
commun icat ion  complexity  increases  logar i thmi cal  1 y  with  3, 
the  number  of  backends.  Clearly,  this  alternative  is  also 


contrary  to 

des  i 

gn 

goal 

of  minimizing  the  message 

between  the 

b  a  c ' 

<  0  n 

d  s  . 

Also  , 

,  the  computing  complexity 

merge  phase 

3n  d 

th 

e  access 

complexity  increase  ex  pen 

by  log  3,  where  3  is  the  number  of  backends.  Clearly,  wide 
this  alternative,  increasing  the  number  of  backends  will 
cause  longer  response  time  and  decreased  through? jt .  3c,  we 
will  not  consider  3.2  to  be  a  desirable  distribution  of 
function . 


This  leaves 

us  with 

alternatives  C .  1 

and  C.2. 

Alternative  C.1 

is  that 

the  backends  perform 

the  s  )  r c  i  r  g 

block-by-block  and  the  controller  merges  all  the  blocks. 
Alternative  C.2  is  that  the  backends  perform  the  sorting  and 
a  partial  merge,  and  then  the  controller  performs  the  final 
merge.  Neither  alternative  incurs  transmission  overhead. 
Therefore,  the  design  goal  of  minimizing  the  bus  traffic  is 
met . 

In  both  alternatives  the  work  of  sorting  and  merging  is 
shared  between  the  backends  and  the  controller.  Alternative 
C . 1 , however , does  involve  more  work  for  the  controller  th_n 
alternative  C.2.  Since  the  backends  perform  the  ,.»d’n 
portion  of  the  merge  process  in  C.2  the  controller's  work  is 
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reduced  . 


On  the  other  hand,  the  workload  of  the  backends  is 


i 


f- 


greater  with  alternative  C.2,  than  with  alternative  C.l. 
Let  us  analyte  these  two  alternatives  with  respect  to  the 
design  goals  of  minimizing  the  controller  function  ^nc 
maximizing  the  work  done  by  the  backends  in  the  next 
section  . 

E.  COMPARISONS  3 £7 VEEN  ALTERNATIVES  C.l  AND  C.2 

In  this  section  we  will  compare  the  two  alternatives, 
namely  C.l  and  C.2.  In  comparing  these  two  alternatives,  we 
will  analyte  computing  complexity  and  access  complexity 
separately.  Since  the  time  to  do  one  disk  access  is  much 
longer  than  the  CPU  time  to  perform  one  comparison,  separate 
analyses  will  be  more  meaningful. 

As  is  shown  in  Table  1  ,  the  internal  sort  nhdsr 
computing  complexity  is  the  same  for  both  alternatives. 
However,  with  alternative  C.2,  the  backends  perform  a  part 
of  the  merge.  Consider  that,  for  both  alternatives,  if  the 
number  of  blocks  is  held  constant,  increasing  the  number 
backends  will  cause  the  number  of  blocks  to  be  sorted  at  one 
backend,  b,  to  decrease.  This  decrase  is  linear  work  respect 
to  the  number  of  backends.  Therefore  the  computing 
complexity  on  the  backends  decreases  linearly  with  an 
increasing  number  of  backends. 

However,  meeting  the  amount  of  work  done  by  the 
controller’  function  is  clearly  less  with  alternative  C.2 
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than  with  alternative  C .  1  ,  due  to  the  fact  that  the  backends 
offload  some  of  the  work  of  merging  from  the  controller. 
Consider  the  case  where  the  total  number  of  recors  'N=3*b*r) 
is  held  costant.  The  computing  complexity  of  alternative  C .  ‘ 
will  not  vary  with  the  number  of  backends.  For  the 
alternative  C.2,  the  computing  complexity  of  the  merging  at 
the  controller  will  increase  logarithmically  with  the  n urn be- 
of  backends.  however,  the  computing  complexity  for  merging 
at  the  controller  will  always  be  less  for  the  alternative 
C.2  than  C.1  by  a  factor  of  (3*o*r*log  b>.  Since  b  iec^eases 
as  3  increases,  the  gain  will  be  proportionately  smalle-  as 
3  grows  large.  Clearly,  however,  the  alternative  C.2  better 
fits  the  goal  of  minimizing  the  controller  function. 
Clearly,  a  substantial  reduction  in  the  controller  workload 
will  result  from  assigning  more  functionality  to  the 
backends . 

Now  let  us  examine  the  effect  of  increasing  the  number 
of  backends.  We  will  analyze  the  computing  complexities, 
access  complexities,  and  communicat ion  complexities  of  both 
alternatives.  Let  us  examine  the  case  that  the  total  number 
of  records,  N=B*b*r,  and  the  number  of  records  per  blocks, 
r,  remaining  constant,  while  the  number  of  backends,  3, 


increases 


"or  alternative  C.1,  the  total  computing  complexity  j.l 
b*r*log  r  +■  3*b*r*log(  3*b)  ,  or 

M*(  log  M  -  ((9-1)/3)*log  r  ),  since  b=N/(3*r). 

This  obviously  yields  decreasing  results  for  increasing 
values  of  3.  This  reduction , however , will  have  minor  effect 
on  the  result  of  the  computing  complexity.  Further,  the 
reduction  can  be  ignored  for  large  M  values. 

For  alternative  C.2,  the  total  computing  complexity  is 

b * r *  1  o g  r  b * r *  1  o g  b  +  3*b*r*log  3  ,  or 

N * (  ( 1 / 3 ) *  log  N  +  ( (3-1 )/3)*log  3  ),  since  b  =  N/(3*r). 

(1/B)*log  M  obviously  decreases  with  the  increasing  3. 
However,  ( ( B- 1 ) / B ) *  log  3  increases  for  increasing  3  values. 
There  is  some  breakpoint  for  where  the  effect  of  the 
decreasing  term  has  more  effect  than  the  increasing  term. 
Let  us  assume  that  we  double  the  number  of  backends.  The 
difference  in  the  total  complexities  between  the  case  that 
the  backends  are  not  doubled  and  the  case  that  do  double  is 

N*(  ( 1/(2*3) )*log  N  -( 1/(2*B) )*log  B  -( (2*B-1 ) / (2*3) )  ). 

As  long  as  the  condition,  log  M  >  log  3  +  (2*B-1),  holds, 

the  total  complexities  will  be  reduced.  So,  if  the  condition 

U8-1 

N  >  B*2  holds  after  doubling  the  number  of  backends,  then 
the  computing  complexity  decreases. 
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.  Mow,  let  us  examine  the  access  complexity.  "or  the 
alternative  C.1,  the  total  access  complexity  is 

b  +•  3*b*log(3*b),  or 

N  * (  ( 1 / ( 3  *  r ) )  +  1  o  g  (  N  /  r  )  ),  since  b  =  M / ( 3  * b  )  . 

Clearly,  this  complexity  decrease  as  3  increases.  However, 
the  decrease  still  nas  minor  effect,  especially  for  a  lar5e 

M  . 

The  access  complexity  for  alternative  Z.2  is 
b  *  1  o  g  b  +  3  *  b  *  1  o  g  3,  or 

( M/r ) »(  ( 1 / 3 ) * ( log  M  -  log  r)  +  ( ( 3- 1 ) / 3 ) * lo g  3  ). 

Again  ( 1 /B ) * ( log  M  -  log  r)  decreases  ,  but  ( ( 3- 1 ) /3 ' *  1 og  3 
increases  as  3  increases.  Let  us  again  assume  that  we  double 
the  number  of  backends.  The  difference  in  complexities  going 
from  3  backends  to  2*3  backends  is 

(N/r)»(  ( 1 / ( 2* B ) ) * (  log  N  -log  B  -log  r  -2*3  +  1  )  ). 

As  long  as-the  condition,  log  N  >  log  B  +  log  r  +  2*3—1 , 

holds,  the  total  access  complexity  decreases.  So,  if  the 

i8-1 

condition  (N/r)  >  B*2  holds  after  doubling  the  number  of 
backends,  then  the  total  access  complexity  decreases. 


Figure  4.2  illustrates  the  computing  3  n  d  access 

3  r 

complexities  for  both  alternatives  for  M  =  2  and  '1-2  a  re 
r=64  as  3  increases.  As  is  easily  seen,  alternative  2.2  is 
always  better  than  alternative  2.1  for  meeting  tne  design 
goals  of  MD3S. 

F.  RECOMMENDED  DISTRIBUTION  OF  FUNCTIONALITY 

In  the  previous  sections  of  this  chapter  we  have 
analyzed  the  alternatives  of  the  distribution  of  the 
functionality  and  shown  te  tradeoffs  3rd  the  advantages  of 
each  one.  Briefly,  alternatives  A,  3.1,  and  2.2,  ar a 
contrary  to  the  design  and  implementation  goals  of  M23S.  The 
other  two  alternatives ,  C.1  and  C.2,  are  pertinent  f  o’*  ou- 
concer ns . 

At  each  comparison  for  alternatives  C.1  and  2.2  in  the 
previous  section,  we  have  shown  that  C.2  is  better 
alternative  for  large  number  of  records.  Therefore,  we 
recommend  that  the  functionality  be  distributed  in  the 
following  manner:  the  backends  perform  the  sorting  a_nd 
partial  merge  ,  and  the  controller  performs  the  final  merge. 


N=2l5 
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Figure  4.3.  Access  Complexity  for  Alternatives  C.l  and  C. 
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In  our  previous  analyses,  we  made  the  assumption  that 
records  are  sorted  one  block  at  a  time,  using  some  well- 
known  sorting  algorithm  with  time  complexity  of  D(r»log  - '  , 
where  r  is  the  number  of  ,-ecords  in  a  block.  In  this 
chapter,  we  will  examine  the  effect  of  sorting  -eoo^is 
blocks  at  a  time,  and  the  effect  of  using  a  k-way  tenge. 


A.  SORTING  .1/ 1 T 4  n  BLOCKS  AT  A  riME 

In  this  case,  we  sort  n  blocks  at  a  time.  There  are  two 
cases  to  consider.  First,  if  the  sorted  blocks  are  stored 
back  into  the  backend's  secondary  storage,  our  analysis  will 
be  the  same  as  the  previous  one,  except  that  the 
coefficients  of  the  computing  complexity  formulae  will  be 
proportional  to  n. 

The  computing  complexity  for  n-blocks-at-a-time  sorting 
is  0(n*r*log  n*r).  This  process  will  be  repeated  b/n  times. 
Therefore,  the  effective  computing  complexity  is  Q-(b*r*log 
n*r).  The  access  complexity  remains  the  same,  0(b).  Since 
there  will  be  b/n  runs  to  be  merged,  the  number  of  passes 
over  data  becomes  log  (b/n).  Computing  complexity  for  merge 
phase,  then,  will  be  0(  b*r *  log ( b/ n )  ).  Access  complexity 
for  the  merge  phase  is  0( b*log ( b/n)  . 


Table  2  summarizes  time  complexities  for  both  blcik-by- 
block  and  n-block- : o-a- time  algorithms.  As  is  easily  seen, 
the  computing  comp  axity  for  sorting  n-block-at-a-time 
algorithm  is  (b*r*log  n)  times  that  for  sorting  block-by¬ 
block.  However,  the  computing  complexity  for  merging  is  less 
by  (b*r*log  n),  and  the  access  complexity  is  less  by  (b*log 


Figure  5.1  shows  the  effect  of  increasing  the  number  of 
backends  on  the  throughput  of  the  CP'J  when  sorting  n-blocx- 
at-a-time.  The  x-axis  shows  the  number  of  backends,  and  y- 
axis  shows  the  computing  complexity  of  sorting  and  merging 
at  the  backends.  The  y-axis  is  shown  with  log  scale.  These 
values  were  derived  as  follows.  The  complexity  formulae  are 
expressed  in  terms  of  M ,  the  total  number  of  records  to  be 
sorted,  and  B,  the  number  of  backends.  The  computing 
complexity  required  at  the  backends  is 


( b*r *  log ( n*r  )  )  +  C  b*r  *llog  (  b/n  )"l)  or 
(N/B)*(log  N  -  log  8),  since  b=N/(B*r). 


Using  various  values  of  N,  varying  B  from  2  to  15,  we  arrive 
at  the  curves  shown  in  Figure  5.1. 


COMPUTING  COMPLEXITY  COMMUNICATION 

- r  -  '■  ACCESS  COMPLEXITY  COMPLEXITY 


'Jp  to  this  point,  we  have  utilized  2-way  merge  process 
for  our  analyses.  As  we  recall,  the  time  complexity  of  merge 
is  dominated  by  the  number  of  runs,  i.e.,  the  number  of 
blocks  which  are  already  internally  sorted.  The  logarithmic 
value  of  the  number  of  the  runs  gives  the  number  of  passes 
over  data.  In  2-way  merge,  the  number  of  passes  is  the 
logarithm  base  2  of  the  number  of  runs.  If  we  increase 
order  of  the  merge  to  <,  for  k-way  merge,  the  number  :  f 
passes  will  be  t n e  1  o g a r  i t h m  to  the  base  k  of  the  nun'*;'-  :  f 

runs. 

Let  us  examine  what  we  gain  with  this  reduced  number  of 
passes.  We  assume  that  all  ’*uns  are  of  epual  length.  The 
notation  used  is: 

R  =  the  number  of  runs  =  b/n 
log  x  =  logarithm  base  2  of  x 
LOG  x  =  logarithm  base  k  of  x 

First  of  all,  our  access  time  will  be  reduced. 

In  2-way  merge,  access  complexity  is  : 

0(  b*  log  R!  )  . 

In  k-way  merge,  access  complexity  is: 

0(  b*  LOG  Rl  )  . 

Figure  5.2  gives  us  some  information  about  the  reduction  in 
access  complexity  for  a  fixed  number  of  R,  and  a  fixed 
number  of  blocks,  b,  as  k  increases.  The  x-axis  shows  k , 
where  k  is  the  number  of  blocks  merged  at  one  time.  The 
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is  one  half  of  that  for  a  2-way  merge. 

On  the  other  hand, of  course,  increasing  '<  will  increase 
the  computing  complexity,  or  time  that  is  necessary  t: 
compare  the  values.  In  the  2- way  merge,  the  computing 
complexity  is 

0 (  b * r *  log  2  ). 

In  k_way  merge,  computing  complexity  is 

D(  b*r * ( k- 1 ) *  LOG  r]  )  or 

*  m 

0 (  b*r*(k-1)*  (log  2  /log  k)i  )  , 

since  L0G(R)=  log(R)  /  log  '.< . 

Figure  5.3  shows  the  increase  in  computing  complexity 
with  regard  to  k  for  a  fixed  R.  The  y-axi's  for.  tr  s  formula 
is  scaled  starting  with  minimum  1 ,  where  1  represents  the 
computing  complexity  for  a  2-way  merge  at  the  backend.  The 
ratio  of  k-way  merge  computing  complexity  to  the  2-way  merge 
computing  complexity  is  graphed  here.  ‘•"or  instance,  the 
4-way  merge  computing  complexity  is  1.5  times  that  of  the 
2- way  merge  computing  complexity. 


53 


As  seen  from  Figure  5  .  -1 ,  the  access  couple'7 ’tv  reduces 
rapidly  up  to  40*  of  the  2  -  way  merge  complexity  at  k  =  5  . 
However,  at  this  point  the  computing  complexity  has  doucled. 
After  this  point,  k>6  ,  the  reduction  in  access  complexity 
becomes  negligible  relative  to  the  increasing  computing 
complexity.  Therefore,  we  can  take  the  point,  k  =  c,  as  an 
implementation  point  for  the  degree  of  the  merge. 


Computing  and  Access  Complexities  of  Merging  with  various 


.new 


Up  to  this  point  we  have  not  cons i her i  how  the  existing 
features  of  MDBS  software  architecture  mi^'t  be  utilized  for 
the  sort  function.  We  :nay  ask  a  question  such  as  whether  the 
descriptor  and  cluster  information  can  be  used  to  improve 
sorting?  Another  question  is  whether  existing  I  /  ?  mechanisms 
can  be  used  to  support  the  temporary  storage  r*equire~ent  s  . 
third  question  is  whether  an  alternative  strategy  shouli  be 
adopted  when  the  number  of  records  are  not  evenly 
distributed  across  the  backends.  We  will  examine  these 
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questions  in  detail. 

1 •  Utilizing  the  Descriptor  and  Cluster  Information 

Recall  that  the  database  in  MDBS  ^  organized  into 
clusters.  Each  cluster  has  a  unique  cluster  i d  ,  and 
associated  with  a  unique  set  of  cluster  ids .  A  record 
belongs  to  one  and  only  one  cluster.  The  cluster  to  which  a 
record  belongs  is  determined  by  the  set  of  descriptor  ids 
which  can  be  derived  from  the  directory  keywords  of  the 
record. 


How  might  this  helps  us  in  sorting?  First  consider 
the  case  that  the  primary ( first-1 isted  )  attributes  in  the 
ordering  specification  are  not  directory  attributes.  In  this 
case,  the  cluster  to  which  a  record  belongs  has  no  bearing 
on  the  final  sorted  order. 

Next  consider  the  case  where  the  a^ributes  in  the 
ordering  specification  are  all  directory  attributes.  In  this 


case,  we  can  use  cluster  information  in  tne  following 
manner.  If  we  also  know  the  relative  order  of  the  descriptor 
ids  which  determine  the  clusters,  we  may  simply  concatenate 
the  records  from  the  cluster  having  the  lowest  ordeK* 
descriptor  ids  with  the  cluster  having  the  next  higher  crdei" 
descriptor  ids,  and  so  on. 

r i n  all y ,  consider  the  case  where  the  p- in  ary 
(first-listed)  attributes  in  the  ordering  specification  are 
directory  attributes,  and  the  secondary  attributes  in  the 
ordering  specification  are  non -directory  attributes. 

Let  us  first  take  a  look  at  what  we  may  need  to 
utilize  the  existing  nachanisms.  What  is  useful  for  sorting 
process  is  to  know  cluster  ids  and  consequently  the  group  of 
descripter  ids(DIDs).  The  necessary  point  is  to  know  the 
DIDs  associated  with  records.  If  the  record  process  is 
informed  with  the  DIDs  of  records  as  we1!  as  their  addresses 
and  also  the  records  are  retrieved  in  terms  of  cluster 
numbers,  that  is,  there  is  no  record  retrieved  belongs  to 
another  cluster  till  all  the  records  belonging  to  a  cluster 
are  retrieved.  This  process  guarantees  that  if  the  records 
are  going  to  be  sorted  with  an  attribute  which  is  directory 
table  attribute,  and  if  the  attribute  is  either  type_A  or 
type_B  attribute  then  none  of  the  clusters  will  have  a 
record  with  the  same  attribute  value.  We  also  need  another 
process  to  define  which  cluster  has  less  or  larger  value  of 
attributes.  This  process  needs  to  check  DIDs  of  clusters 
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from  descr  i.ptor-to-descriptor-id  table  and  gives  a  list  of 
OIDs . 

We  may  consider  the  utility  of  the  above  oases. 
First  of  all,  we  need  to  implement  three  different 
algorithms  to  handle  these  three  different  cases.  Second, 
probabilty  that  primary  sort  specification  attributes  are 
directory  attributes  is  unknown. 

Let  us  assume  that  the  system  will  be  augmented  with 
the  implementation  of  the  cluster  information.  In  that  case, 
modifications  to  MDBS  are  to  be  done.  Recall  that  the  reco-d 
processing  knows  only  the  addresses  of  the  '■ecords  to  be 
retrieved.  Therefore,  record  processing  is  to  be  informed 
with  not  only  cluster  info  but  also  descriptor  information 
from  directory  management,  including  relative  ordering  of 
clusters  based  on  descriptor  ids.  We  do  not  have  a 
mechanism  available  to  support  the  idea.  On  the  other  hand, 
this  implementation  violates  the  information-hiding 
principles  upon  which  directory  management  designed. 

2 .  Utilizing  Existing  Mechanism  f or  Storing  Tempor ar v 

Data 

In  the  previous  sections  we  have  assumed  that  the 
system  was  providing  the  temporary  storage  requirements  for 
the  sort  function.  We  have  not  considered  about  how  this 
might  be  accomplished.  We  know  that  system  allocates  tracks 
as  required  for  new  clusters  or  for  extending  existing 
clusters.  Therefore,  we  know  that  there  exists  a  mechanism 
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for  allocating  storage.  The  difficulty  lies  in  t.nat  the 
allocation  is  related  to  the  concept  of  a  cluster,  a nd  not 
to  a  "block”  of  data. 

In  order  to  use  the  existing  mechanises,  then,  we 
must  establish  some  relationships  between  blocks  of  sorted 
data  and  clusters.  Since  we  are  sorting  block-at-a-time ,  we 
initially  need  to  establish  as  many  temporary  clusters  as  we 
have  blocks  of  data.  Then,  with  each  succtsive  pass  of  tne 
merge  algorithm,  we  will  require  m1 y  half  the  previous 
number  of  clusters,  although  the  total  space  require: 
remains  the  same. 

In  current  MDBS,  storage  is  allocated  only  in  the 
case  of  an  insert  request,  where  the  records  is  to  be 
inserted  into  an  already-full  cluster  or  a  new  cluster  is  to 
be  established.  The  list  of  available  (free)  secondary 
storage  addresses  is  maintained  by  directory  management,  'lew 
addresses  for  new  clusters  are  assigned  during  the  address- 
generation  phase. 

The  second  consideration  ’is  th-t  addresses  are 
associated  with  specific  clusters,  and  new  cluster  ids  are 
assigned  only  by  the  controller.  The  third  considerat ion  is 
that  records  are  inserted  record-at-a-time ,  based  on  an 
insert  request.  For  the  sorting  process,  we  wish  to  write 
blocks  of  records. 


In  order  to  jse  the  existing  mechanisms,  we  muse 
modify  MDBS  so  that 

(1)  The  sort  process  can  request  a  new  temporary  cluster. 
This  may  involve  sending  a  message  to  the  controller. 

(2)  Directory  management  can  generate  addresses  as  required 
for  the  temporary  clusters. 

(3)  Record  processing  can  insert  Plucks  of  r ec ord s  as  well 
as  single  record. 

(4)  Temporary  , clusters  and  their  storage  can  oe  f~eed  when 
no  longer  needed. 

This  is  a  disadvantage  due  to  extensive  modifications. 

As  an  alternative,  we  may  consider  the  following 
case.  Reserve  a  certain  number  of  addresses  as  temporary 
storage  at  system  setup  time.  'Jse  these  addresses  and  the 
low-level  read  and  write  functions  of  record  processing  for 
temporary  storage. 

3 •  The  Case  that  Records  are  not  Evenly  Distributed 
Across  the  Backends 

Our  time  complexity  formulas  reflect  the  perfect 
conditions  for  distribution  of  the  records  which  are  to  be 
sorted.  They  do  not  give  the  correct  results  for  the 
condition  that  one  backend  contains  all  the  records  to  be 
sorted  and  the  other  backends  do  not  contain  any  records.  In 
such  a  case  there  are  two  alternatives  to  be  c-onsidered.  The 


first  is  that  the  backend  having  the  -  e  c  o  r  d  s  performs  t.ne 
sort  function  without  redistributi  of  records.  The  second 
is  that  the  records  are  redistr  pouted  evenly  among  the 
backends.  In  the  following  sections  we  will  examine  these 
two  alternatives  in  detail. 


a.  The  Backend  Performs  the  Sort  Function 

Assuming  that  the  algorithm  in  Chapter 


section  C.2  has 

been 

selected 

for  implv.ner. ting 

sort  functi 

in  MDBS,  we  will 

cal 

culate  the 

time  c - mplex  i  t 

i e  s  for 

sort  f unction . 

The 

backend 

now  contains  (  9  *  b 

)  blocks.  S 

the  internal  sort  phase  time  complexity  is 

0  (  3*b*r*log  r  )  , 

and  2*9*b  accesses  to  the  secondary  memory  are  required 
which  is 

0  (  3*b  ). 

The  merge  process  requires  the  time  0(3*b*r*  log  (3*b);  ) 

with  the  access  time  to  the  secondary  memory  0(3*b*  log 
(  B*b )]  )  . 

Therefore,  the  sort  function  time  complexity  is 
0  (  B*b*r*(log  r  +■  log  ( B  *  b )  ()  ), 
and  the  required  accesses  to  the  secondary  memory  are 


0  (  8*b*  ilogCB*b)l  ) 


The  Records  are 


A  m  o  r.  g 


distributed 


-  v  e  n  . ' 


Other  3ackends 

In  this  alternative  the  backends  should  inf o-m 
the  controller  if  thay  do  not  have  any  records  to  be  sorted. 
The  controller  then  manages  the  transfer  of  the  -ecords  from 
one  backend  to  the  other  backends. 

Since  one  backend  contains  (3*b)  blocks,  o'.r-1' 
records  are  to  be  transmitted  to  th<_  other  backends.  This 
requires  communication  time  of  O'"  3*t )  .  The  backends  row 
contain  equal  number  of  blocks,  o.  The  time  complexities 
now  be  calculated  as  in  the  Chapter  IV  section  C.3. 

The  internal  sort  process  time  is  D(b*r* log  r). 
The  merge  process  time  at  the  backends  is  D(  b*r*log  o  '. 
Accesses  required  at  the  backends  are  0  (  b*(1+log  b)  ). 

Depending  on  the  average  time  required  to 
transmit  a  block  from  a  backend  to  another  backend,  we  can 
analyze  the  difference  between  the  aforementioned 
alternatives.  At  this  moment  we  do  not  know  the  value  of  the 
transmission  time  a  block.  Clearly,  there  are  some  cases  in 
which  the  transmission  is  not  cost-effective. 


A.  TERMINOLOGY  AND  NOTATION 

First,  let  us  define  some  terminology.  A  join  involves 
two  relations,  the  source  relation  and  the  target  relation . 
The  join  is  formed  over  an  attribute  (or  attributes)  that 
belong  both  to  the  source  relation  and  to  the  the  target 
relation.  We  will  call  these  the  source  attr ibute( s )  3nd  the 
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attributed)  must  be  the  sane  as  the  domains  of  the  target 
attribute(s)  . 

There  are  many  types  of  joins.  "irst  we  examine  the 
natural  join.  Let  us  use  an  example  to  illustrate  toe 
natural  join.  The  relations  participating  in  a  natural  join 
are  given  in  Figure  5.1.  a.  Relation  S,  the  source  -elation  , 
consists  of  three-tuples  of  attributes,  A,  3,  ant  J. 
Relation  T,  the  target  relation,  consists  of  three- tuples  of 
attributes,  3 ,  3,  and  3.  The  assumption  is  mace  that 
attributes  having  the  same  name  are  defined  ever  the  same 
domain  of  values.  Thus,  the  attributes  8  and  C  in  relation 
3  are  assumed  to  be  drawn  from  the  same  domain  of  values  as 
the  attributes  3  and  C  in  relation  T.  "  i  g  u  r  e  5.1  shows  the 
cross  product  SxT  of  relations  S  and  T,  SxT.  SxT  is  formed 
by  concatenating  each  tuple  of  relation  3  with  every  tuple 
of  relation  T. 


The  natural  join  is  formed  in  two  steps.  -irst  select 
from  SxT  the  tuples  such  that  the  values  of  both  columns 
headed  by  8  and  both  columns  headed  by  C  are  the  same. 
There  are  three  such  tuples,  the  first,  fifth,  and  ninth 
shown  in  Figure  6.1  .(b).  The  second  step  is  to  project  from 
those  tuples-  one  column  for  each  distinct  attribute.  The 
result  relation,  3 ! X ! T ,  is  shown  in  Figure  5.1.(c). 


3  and  7  where  attribute  values  of  3  and  Z  ir.  3  are  identical 
to  the  attribute  values  of  3  and  7  in  7,  i.e.,  3. 3  =  7. 3  and 
S.CsT.C.  When  the  equal  comparison  operator  is  used,  the 
join  operation  is  called  an  e-ouality  join.  When  any  other 
comparison  operator  is  used,  the  join  operation  is  called  an 
inequality  join .  7he  join  operation  is  associative,  so  that 
more  than  two  relations  may  be  joii.ed.  ror  example  the  join 
of  three  relations  3,  7,  and  7,  is  ‘■’ne  same  as  the  joi”  of  3 
and  T,  and  the  join  of  'J  and  the  first  join. 

7here  are  a  variety  of  join  algorithms.  7he  simplest  is 
the  straightforward  or  nested-loops  join.  7he  algorithm  is 
shown  in  Figure  5.2. 


For  each  tuple  in  the  source  relation  do 
For  each  tuple  in  the  target  relation  do 
If  the  join  condition  holds  true  then 

form  a  result  tuple 


Figure  6.2.  Str  aight  <’orw»rd  Join  Algorithm 

In  the  chapters  which  follow,  we  will  simplify  our 
analysis  by  assuming  that  join  operations  are  restricted  to 
equality  joins  over  a  single  source  attribute  and  a  single 
target  attribute.  The.  terms,  source  relation  and  target 
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relation,  refer  to  the  files  participating  in  a  join 
operation  in  MDBS.  Hence,  the  source  file  refers  to  a  source 
relation,  and  the  target  file  refers  to  a  target  -elation. 


We  will  also  adopt  the  following  notation: 

Cs  :  The  number  of  records  in  a  source  file 


I 


Ct  : 


n 


m 


T 


log 


The  number  of  records  in  a  target  file 

The  number  of  blocks  belonging  to  a  source  file  at 

a  backend,  Cs/(3*r). 

The  number  of  blocks  belonging  to  a  target  file  at 
a  backend  ,  Ct/ ( 3*r )  . 

Quotion  of  the  cross-pr od"ot  of  a  source  file  an: 
a  target  file  which  participate  in  a  join 
operation . 

Logarithm  to  the  base  2. 


3.  ASSUMPTIONS 

In  analyzing  the  alternatives  of  the  distributions  of 
the  join  function,  we  make  the  following  assumptions. 

1)  The  source  and  target  records  are  distributed  equally 
across  the  backends. 

2)  The  join  operation  is  an  equality  join  over  a  single 
source  attribute  and  a  single  target  attribute. 

3)  The  join  function  is  performed  after  the  retrieval  and 
selection  operations  specified  in  the  request  have  been 
performed. 
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used  to  perform  the  join. 

5)  Accesses  to  the  secondary  storage  are  oar-iei 
block-by-block . 

6)  The  source  and  target  files  do  not  contain 
duplicate  records  (i.e.,  after  retrieval  of  the  r-ee 
which  are  to  participate  in  the  join  operation,  the-e 
no  two  identical  records  in  the  source  file  c -  in 
target  file),  Therefore,  there  is  no  record  elinina 
process  from  tne  files. 

C.  A  SYNTAX  FOR  THE  JOIN 

In  this  section,,  we  will  give  a  syntax  for  a  2-way  j 
MDBS  utilizes  an  attribute-based  data  language,  A3QL, 
user  queries.  Indeed,  an  A3DL  can  be  used  for  any  data 
applications  as  a  kernel  language  of  any  kind  of  data 
machines.  Current  database  application  language  queries, 
instance,  SQL,  can  be  be  mapped  to  A3DL  requests. 

Using  ABDL,  a  2-way  equality  join  request  is  shown 
the  following . 

RETRIEVE  ( attr ibute_l ist_1 )  (query_1) 

CONNECT  ON  ( attr ibute_1 ,  attribute_2) 
(attribute  list  2)  (query  2) 


The  RETRIEVE  clause  :".:.:es  mat  r,;,s  « 

attribute-value  pairs  given  in  attr ibute_list_1  satisfy 
conditions  given  in  query_1 ,  3nd  the  records  w 

attribute-value  pairs  given  in  attr ibute_l ist_2  satisfy 
conditions  given  in  query_2,  are  extracted  fro-n 

database.  Let  R1  and  R2  be  the  two  different  f 

containing  these  records,  respectively.  The  Cl'IdE: 

clause  specifies  the  join  on  the  relations  Ri  and  ?? 
the  attributes  attribute_1,  which  is  (implicitly  if 
explicitly)  in  attr ibute_l ist_1 ,  and  attr ibute_2 ,  whic 


in  attribute  list  2. 


j  o :  m 


VII.  THE  ALTERNATIVE  DISTRIBUTIONS  2l  THE 

In  analyzing  the  alternative  distributions  of 
function,  we  will  ag3in  consider  three  d 
possibilities. 

A.  The  controller  performs  the  join  function 
3.  The  backends  perform  the  join  function 
C.  The  join  function  is  shared  by  the  controller 
backend s  . 

'v'e  will  examine  each  of  these  alternatives  in  detail 
following  sections. 

A.  THE  CONTROLLER  PERFORMS  THE  JOIN  FUNCTION 

In  this  alternative,  the  backends  perform  the  r 
of  the  records  which  will  participate  in  the  join  op 
These  records  are  then  sent  to  the  controller, 
controller  performs  the  join. 

Since  each  backend  contains  n  source  file  blocks 
target  file  blocks,  the  communication  complexity  is 

0(  B* ( n+m)  ) ,  or 
0(  ( Cs+Ct ) / r  ). 

Then,  storing  these  blocks  in  the  secondary  storage 


controller  has 


access  complex  icy. 

After  receiving  the  "  e  c  o  r  d  s  f r o n  a  1 


c  a c  '< e  r  d  s  ,  tn  • 


controller  now  has  C  3  *  r. )  source  file  blocks  and  ( 3  *  n '  target 
file  blocks.  'Jsing  the  straightforwar d  join  algorithm,  each 
record  in  the  source  file  is  cot  oared  with  each  reco-d  1-'. 


the  target  file  in  order  to  form  the  join.  This  re : j i ~e : 
(Cs*Ct)  comparisons.  So,  the  computing  complexity  is 


SC  Cs  *  Ct  ) 


Assuming  that  no  more  than  one  block  of  the  source  file 

and  one  block  of  the  target  file  are  in  the  primary  storage 

2 

at  one  time,  2*(B*n*m)  accesses  to  the  secondary  memory  are 
required.  In  terms  of  the  cardinalities  of  the  source  and 
target  files,  this  is  the  access  complexity  of 


0(  (Cs*Ct)/r  )  . 


B.  THE  BACKENDS  PERFORM  THE  JOIN  FUNCTION 

In  this  alternative,  we  will  consider  three  different 
strategies.  In  the  first,  the  backends  share  the  join 
operation  equally.  In  the  second,  the  join  function  is 
performed  step-by-step  at  the  backends.  In  the  third,  a 
single  backend  performs  the  join  function  with  the  complete 
source  and  target  files.  Let  us  examine  the  details  of  these 


strategies. 


he  Backends  Snare  t .n 


j  a  1 1  v 


In  this  strategy,  the  backends  send  eitner  source  or 
target  records  to  each  other.  Let  us  assume  that  the  target 
records  are  transmitted  between  the  backends.  After 
transmission  of  the  records,  each  backend  contains  Ct  target 
records.  Mext,  each  backend  performs  the  join  function  over 
its  own  oart  of  source  records  and  all  of  the  target 


records .  Then,  the  result  records  from  the  backer.os  a^e 
transmitted  to  the  controller. 

Since  each  backend  contains  m  target  file  blocks, 
(3*m)  target  file  blocks  are  transmitted.  Therefore, 
complexity  of  transmitting  the  target  file  blocks  among  the 
backends  is 


0(  3»m  ) ,  or 
0(  Ct/r  ). 


Each  backend  first  stores  (B*m)  target  file  blocks 
which  requires  access  complexity  of 


0(  Ct/r  ). 


Each  backend  now  contains  n  source  file  blocks  and 
(3*m)  target  file  blocks.  Therefore,  the  effective  computing 
complexity  for  performing  the  join  is 


0(  3*n*m*r  ) ,  or 
0(  Cs*Ct/B  ). 


2* 3 * .n # n  accesses  to  the  secondary  storage  are  reqjired,  so 
the  access  complexity  is 

0(  Cs*Ct/(9*r)  ). 

Finally,  each  backend  transmits  the  result  records 
to  the  controller.  Let  us  assume  that  each  backend  yields 
the  same  number  of  result  records,  expressed  3s  a  percentage 
q  of  the  cross-production  of  the  records  participating  in 
the  join.  Then,  the  number  of  the  records  to  be  transmitted 
from  each  backend  to  the  controller  will  be  (q*3*n*n*r)  or* 
(q*(Cs*Ct)/5) .  The  communication  complexity  for  transmission 
the  result  records  from  9  backends  to  the  controller,  then, 
is 

0(  q*(Cs*Ct)/r  )  . 

2  .  The  Backends  Perform  the  Join  Step-by-Step 

In  this  strategy,  the  join  operation  is  performed 
step-by-step  at  the  backends.  At  each  step,  the  number  of 
backends  involved  in  the  join  is  reduced  by  one-half.  A 
backend  performing  the  joi.n  function  sends  its  source  and 
target  records  to  its  neighb  backend.  Figure  7.1  depicts 
the  the  flow  of  records.  The  total  number  of  steps  required 
is  log  B,  where  3  is  the  number  of  backends. 

The  arrows  indicate  the  transmission  direction  of 
blocks.  At  each  step,  the  backends  involved  first  perform 
the  join  on  the  portions  of  the  source  and  target  files 
available,  and  send  the  partial  result  to  the  controller. 


Next,  the  subsets  of  the  source  and  target  files  are  sent  to 
the  neighbor  backend. 

At  each  step,  the  number  of  blocks  to  be  transmitted 
over  the  broadcast  bus  is  half  of  the  total  number  of  source 
file  blocks  plus  half  of  the  total  number  of  target  file 
blocks.  Thus,  the  communication  complexity  for  log  3  steps 
i  s 

0(  ( (Cs+Ct)/r)»  log  3!  )  . 

At  each  step,  the  backends  receiving  the  source  an  i 
target  records  from  their  neighbors  first  store  them  before 
the  join  starts.  The  effective  access  complexity  of  storing 
the  records  at  each  step  is  derived  as  follows. 

1.  step  (1/2)*(Cs+Ct)/(3»r) 

2.  step  1  * ( Cs+Ct ) / ( B*r ) 

3.  step  2  *<Cs+Ct)/(8*r) 

4.  step  4  *(Cs+Ct)/(B*r) 

•  • 

•  • 

•  • 

r  ]  i 

llog  B|  step  2  ^  *(  Cs+Ct)  /  (  B*r ) 

Therefore,  the  total  effective  access  complexity  for  storing 
the  records  is 

0(  2~^&*  (Cs+Ct)/(B*r)  ). 


Trie  c  o  m  p  u  t  i 

ng  complexity  for  the 

join 

is  d e r i v 

following. 

1 .  step 

(  ,n*r  )*(  m*r  ) 

= 

n  *  *  r 

2.  step 

(2 *n*r ) *(2*m*r) 

= 

4  *  *  ~j  #  r' 

3.  step 

(4*n*r)*(4»n*p) 

= 

1  5  r 

log  3 ■  step 

•  • 

•  • 

(  2  "  *n*r)*(2'~  *m*r) 

= 

0  *  o  * 

Therefore,  the  tot a 

1  effective  c o m p u t i n g 

comp  1 

exit y  is 

0( 

2  "  *Cs*Ct/3~  )  . 

Since  the  number  of 

source  and  target  bl 

ocks 

p  a  r  t  i  c  i 

in  the  join  changes  at  each  step,  th 

e  acc 

ess  comp 

during  the  join  is 

derived  as  following. 

1 .  step  2* (  n*m  ) 

2.  step  2*(2n*4m) 

3.  step  2*(4n*4m) 


4.  step 


2* ( 3n*8m) 


hen  , 


The  total  effective  access  complexity  is,  t 

*  .  Sfc  .  . 

Q(  2  ''  «(Cs*Ct)/(B'*r)  ). 

Only  the  result  records  are  transmitted 
controller.  Since  we  use  q*Cs*Ct  to  represent  the  num 
result  records,  communication  complexity  is 

0(  q«(Cs*Ct)/r  ). 

3 .  One  3ackend  Per  forms  the  Join  ^ unction 

In  this  strategy,  the  source  and  target  reco-* 
each  backend  are  transmitted  to  a  designated  backend, 
this  performs  the  join.  Since  each  backend  conta 
source  file  blocks  and  m  target  file  blocks 
communication  complexity  is 

0(  S* ( n+m )  ) ,  or 

0(  ( Cs+Ct )/r  ). 

The  records  sent  from  the  other  backends  are 
stored  into  the  ■  secondary  storage  of  the  designated  be 
This  is  the  access  complexity  of 


0(  ( Cs+Ct )/r  ). 


Tr.e  designated  backend  now  contains  C  s  source 
records  and  Ct  target  records  .  '.'sing  a  str  ai  g'r.t  f  or  war  d  join 
algorithm,  the  computing  complexity  is 

0(  C s * C t  )  , 

and  2*3*n*m  accesses  to  the  secondary  storage  are  recui-ed, 
for  access  complexity  of 

0(  Cs*Ct/r  )  . 

The  designated  backend  produces 
records.  Transmission  of  these  result 
controller  has  complexity  of 

0(  q*(Cs*Ct)/r  ). 

C.  THE  CONTROLLER  AND  THE  3AC-CE.NDS  SHARE  THE  JOIN  FUNCTION 
In  this  alternative,  the  controller  and  the  backends 
share  the  join  function  ,  and  the  controller  integrates  the 
results.  Each  backend  transmits  the  its  part  of  both  the 
source  records  and  the  target  records  to  the  controller.  At 
the  same  time,  each  backend  performs  a  partial  join  with  its 
source  and  target  records.  In  the  meantime,  the  controller 
performs  the  join  function  with  the  sets  sent  from  the 
backends,  except  for  those  sets  which  are  joined  at  the 
backends . 


q*Cs*:t 

records 
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Let  n  1  ,  n  2  ,  .  .  .  ,  n  3  be  the  subsets  of  the  source  file  arc 
ml,  m2,..., m3  be  toe  subsets  of  toe  target  file  such  to  at 
backend  i,  Bi ,  c;  .tains  the  subsets  ni  and  mi.  The 
transmission  of  the  whole  source  and  target  file  into  t ns 
controller  has  the  communication  complexity  of 

0 (  3* ( n+m)  ) ,  or 
0(  (Cs+:t)/r  ). 

The  controller  first  stores  the  records.  This  requires 
the  access  complexity  of  0 (  ( C s  +  C t ) / r  ;  . 

The  partial  join  function  at  the  b  a  c  k  e  n  d  ‘has  t  e 
computing  complexity  of  D(  Cs*Ct/3  ),  and  access 
complexity  of  3(  Cs*Ct/(3*r)  ). 

The  controller  now  contains  ni  source  set  and  mi 
target  set.  Since  the  backends  perform  only  part  of  tr.e 
join  ,  the  rest  of  the  join  function  is  performed  at  the 
controller.  This  means  each  ni  is  compared  with  mj  to  output 
the  result  records  such  that  K  =  i  <  =  B  and  K=  j  <=  B,  and 
i~  =  j.  This  requires  B  * ( 3— 1 )  times  (n*m)  comparisons. 
Therefore,  the  join  function  at  the  controller  has  computing 
complexity  of 

0(  n  *m*  3* ( B- 1 )  )  ,  or 

0(  Cs*C  t  )  , 

l  2. 

and  access  complexity  of  0(  3*n*m  ),  or  0(  Cs*Ct/r  ). 
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D.  EVALUATING  THE  ALTERNATIVE  DISTRIBUTIONS  Or  THE 
FUNCTIONALITY 

In  the  previous  sections  we  have  presentee  five 
alternative  distributions  of  the  functionality  of  join 
between  the  controller  and  the  backends.  In  this  section,  we 
will  analyze  the  tradeoffs  of  the  alternatives.  "Taole  3 
summarizes  the  results  of  the  analyses  in  terms  of  tnat  the 
computing,  access,  and  communication  complexities. 

Alternative  A  represents  the  distribution  of  f uncti on 
presented  in  Section  A  of  this  chapter.  The  : o n t - :  1 1  e- 
performs  the  join  function.  Alternative  5.1  represents  tne 
distribution  presented  in  Section  3.1  of  this  chapter.  The 
backends  share  the  join  function  equally.  Alternative  3.2 
represents  the  distribution  presented  in  Section  3.2  of  this 
chapter.  The  backends  perform  the  join  function  step-by- 
step.  Alternative  3.3  represents  the  distribution  3.3 
presented  in  Section  3.3  of  this  chapter.  Finally, 
alternative  C  represent  the  distribution  C  presented  in 
Section  C  of  this  chapter.  The  controller  and  the  backends 
share  the  join  function.  Let  us  examine  each  of  these 
alternatives  with  regard  to  the  design  goals  of  MDBS. 

Alternative  A  is  clearly  contrary  to  design  goal  of 
minimizing  controller  function.  Therefore,  we  will  eliminate 
it  from  further  consideration.  Alternative  3.1  meets  the 
goal  of  minimizing  controller  function  and  distributing  the 
work  over  the  backends.  The  communication  complexity  is  also 
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less 


than  that  of  either  of  the  other  alternatives,  3.2  and 


3.3  • 

*  Alternative  3.2  meets  the  design  goal  of  minimizing 

t 

controller  function.  However,  the  computing  and  access 
complexities  increase  exponentially  with  the  factor  of  2*log 
j  3.  This  is  an  especially  important  consideration  for  Z'2 

overhead  in  the  system.  In  addition,  the  same  blocks  will  oe 
broadcasted  log  3  times  over  broadcast  bus,  increasing  high 
|  communication  overhead.  As  we  recall,  a  similar  proced  j-e 

was  proposed  in  Chapter  IV  f-  the  sort  function.  However, 
the  characteristic  of  the  join  function  does  not  take 
advantage  of  this  procedure.  At  each  step,  the  output  of  the 
backends  is  wasted,  since  each  record  in  the  source  file 
must  be  compared  with  every  record  in  the  target  file  to 
form  the  join.  The  same  records  will  be  transmitted  between 
the  backends  redundantly.  Therefore,  we  will  eliminate  this 
alternative  from  further  consideration.  Alternative  3.3 
does  not  meet  the  design  goal  of  sharing  the  work  between 
the  backends.  Furthermore,  transmission  of  source  and  target 
file  blocks  into  the  designated  backend  increases  the 
communication  overhead.  This  alternative  is  also  eliminated 
from  further  consideration. 

Alternative  C  increases  the  amount  of  work  which  is  to 
be  done  by  the  controller.  This  is  also  contrary  to  design 
goal  of  minimizing  controller  function.  Therefore,  we  will 
eliminate  this  alternative  from  further  consideration. 
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In  the  previous  chapter,  we  analyzed  the  distribution 
the  functions  of  the  join  operation  in  MD3S  assuming  tr.at  a 
straightforward  join  algorithm  is  used.  In  the  first  p a r t 
of  the  thesis,  we  discussed  how  the  sort  function  can  o ? 
implemented  in  MDBS.  Assuming  that  one  sort  function  is 
implemented  as  recommended  in  chapter  IV,  we  now  discuss  *. ;  »• 
the  join  operation  can  be  implemented  using  a  sort-mat :  : 
algorithm. 

A.  ALTERNATIVE  DISTRIBUTIONS  OF  THE  JOIN  FUNCTION  BY  USING 

A  SORT-MATCH  ALGORITHM 

When  using  a  sort-match  algorithm,  the  source  scores 
and  the  target  records  are  first  sorted.  Then,  the  join 
function  is  performed.  The  join  can  be  formed  by  a  simple 
matching  of  the  source  attribute  values  and  the  target 
attribute  values. 

In  Chapter  IV,  we  examined  how  to  perform  the  sort 
function  at  MDBS.  As  we  recall,  our  proposal  was  to  apply 
tne  alternative  C.2  in  Chapter  IV,  the  backends  sort  and 
perform  a  partial  merge,  and  the  controller  performs  the 
final  merge.  With  such  a  capability,  we  propose  two 
alternatives  for  distributing  the  functions  of  the  sort- 
match  join  algorithm  among  the  controller  and  the  backends. 


The  first  alternative  is  as  follows.  Each  backend  perform s 
sort  and  partial  merge  of  the  source  and  target  records. 
Then,  each  backend  broadcasts  its  target  records  to  all 
other  backends.  Each  backend  then  joins  its  portion  of  tr.e 
source  records  with  all  of  the  target  records,  transmitting 
the  results  to  the  controller. 

The  second  alternative  is  the  following.  The  backends 
perform  sort  and  partial  merge  on  the  source  and  target 
records,  which  are  then  transmitted  to  the  controller.  The 
controller  performs  the  final  merge  of  the  source  records 
and  of  the  target  records,  and  then  performs  the  join  of  all 
of  the  source  records  and  all  of  the  target  records,  let  us 
examine  each  of  these  alternatives  in  detail. 

1 .  The  Backends  Share  the  Join 

In  this  case,  both  source  and  target  files  are  first 
sorted  at  the  backends  separately.  Using  a  comparison-based 
sorting  algorithm,  the  effective  computing  complexity  of  the 
internal  sort  phases  of  both  Cs/3  source  and  Ct/B  target 
records  is 

0(  ( <Cs+Ct)/B)  *  log  r  )  . 

2*(.n+m)  accesses  to  the  secondary  storage  are  required.  So, 
the  effective  access  complexity  during  the  internal  sort 
phases  of  both  source  and  target  files  is 

0(  ( Cs+Ct ) / ( B*r )  ). 
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Assuming  that  the  merge  phase  is  also  performed  to 
complete  the  sorting  of  both  files,  th-e  effective  cemputir. 
complexity  of  the  merge  phase  3t  the  backends  is 

0 (  n*r*:log  n:  +•  m*r*log  m  )  ,  or 

0(  (Cs/B)*log(Cs/(8*r)  )]  +  ( Ct/ 3)  *log  (  Ct/ (  3*r  )  )1  ) 

2  * ( n* log  n  +  m* log  m  )  accesses  to  the  secondary  storage  ar* e 
required  to  complete  the  merge  function.  Therefore,  tne 
effective  access  complexity  for  the  merge  at  tne  backends  is 

0(  n*log  n  +  n*iog  ml  )  ,  or 

0(  ( C s /  (  3 * r ) )  *1  o g  ( C s /  (  3 * r  ) )]  +  (  Ct/ (  9*r  )  )  *iog  {  Ct  /  {  3«r  )  ) 

Next,  the  target  records  are  transmitted  between  the 
backends  .  This  is  the  communication  complexity  of 

Q(  9*m  ) ,  or 

0(  Ct /r  ). 

The  target  records  transmitted  from  the  other 
backends  -are  first  stored  before  the  ^oin  starts.  This  is 
the  access  complexity  of 

0 (  Ct /r  ). 

Each  backend  now  contains  n  blocks  of  the  source  and 
3*m  blocks  of  the  tar  file.  That  is,  each  backend  has  one 
run  of  source  file  and  d  runs  of  target  file  blocks  with  the 
length  of  n  and  m,  respectively.  B*m  target  blocks,  then, 


r  v  v  ~  +  tv  tv  ^  «r. 
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must  be  merged  by  each  backend.  Assuming  that  a  -  -  *  3  y  nerge 
ts  used,  the  computing  complexity  of  merging  3*m  blocks  at  a 
backend  is 

0 (  3*m*r*log  3! ) ,  or 
0(  Ct*flog  al  )  . 


k 

v: 

k 

: 


2*3*m*log  3  accesses  to  the  secondary  storage  are  required. 
So,  the  access  complexity  required  during  one  merge  of 
target  records  is 

3  (  (Ct/r)  *  -log  sl  )  . 

Finally,  each  backend  performs  the  join  over  Cs/3 
source  and  Ct  target  records.  The  effective  computing 
complexity  of  the  join  is 

0 (  min  (  n*r ,  B*m*r  )  ) ,  or 
0(  min  (  Cs/B  ,  Ct  )  )  , 

and  2* ( max  (  n,  3*m  )  )  accesses  to  the  secondary  storage 
are  required.  This  is  the  access  complexity  of 

0(  max  (  Cs/(B*r) ,  Ct/r  )  ) . 
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Each  backend  .now  has  a  port!  on  of  the  result 
records.  Using  the  same  notations  as  in  the  previo  js 
chapter,  there  are  q*(nin  (  Cs/B  ,  Ct  ))  result  records  at 
each  backend.  The  communication  complexity  of  transmitting 
the  result  records  from  each  backend  to  the  controller  is 

0(  (3/r)*q*mi.n  (  Cs/B  ,  Ct  )  ). 

2 .  The  Controller  Performs  the  Join 

Here,  each  backend  performs  the  internal  sort  prase 
and  the  partial  merge  phase  of  the  its  portion  of  the  $ : u -  : e 
and  the  target  records,  and  then  transmits  these  reoo.'is  to 
the  controller.  The  controller  first  merges  the  source  and 
target  records  separately,  and  then  performs  the  join  on  the 
source  and  the  target  records. 

The  effective  computing  complexity  to  sort  n  source 
file  blocks  and  m  target  relation  blocks  at  the  backend  is 

0(  n*r*log  r  +  m*r*log  r  ) ,  or 

0(  (  ( Cs+C t ) / B ) *  log  r  )  )  . 

2*(n+m)  accesses  to  the  secondary  storage  are  required.  3o , 
the  effective  access  complexity  is 

0(  ( Cs+Ct ) / ( B*r )  ). 

Assuming  that  a  2-way  merge  is  implemented  to 
complete  the  sort  of  n  source  and  m  target  file  blocks.  So 
the  computing  complexity  of  the  merge  is 


3 (  n*r *log  n:  +m*r*'log  ml),  or 

3 (  (  Cs/ 3 )  *log  (  Cs/ (  3*r  )  )1  +  (  Ct/3)  »log<  Ct/ (  3*r  )  ).  )  . 

2  *  ( n  *  1  o  g  n  m  *  1  o  g  m)  accesses  to  the  secondary  storage  ar; 

required.  This  is  the  access  complexity  of 

0(  n*'log  nl  +  m*llog  m]  )  ,  or 

3  (  (Cs/(3*r)»log(Cs/(3*r)  )]  +  (  Ct/ (  3*r  )  *log  (  Ct/ (  3*r  )  1  ). 

Next,  the  sorted  records  are  transmitted  to  the 
controller.  So,  the  communication  complexity  is 

3 (  3*(n+m)  ),  or 
0(  (Cs+Ct)/r  ). 

The  records  are  first  stored  at  the  controller  before  the 
join  starts.  This  is  the  access  complexity  of 

0(  (Cs  +  Ct)/r  ) . 

The  controller  now  contains  B*n  blocks  of  source 
file  and  B*m  blocks  of  target  file.  That  is,  3  runs  of 
source  file  and  B  runs  of  target  file  with  the  length  of  n 
and  m,  respectively.  The  computing  complexity  of  merging 
source  and  target  records  saperately  is 

0(  B*n*r*llog  b!  +  3*m*r*llog  b!  )  ,  or 
0(  ( Cs+  Ct)*flog  b!  )  . 

2*  B*  ( n'+m )  *  log  B  accesses  to  the  secondary  storage  are 


required.  This  is  an  access  complexity  of  the  merge  at  the 
controller  which  is  3(  ( ( Cs+Ct)/r) *log  3;  ). 

Finally,  the  controller  performs  the  join  or.  sorted 
source  and  target  files.  The  computing  complexity  for  the 
join  is  Q(  min  (  3 *n*r,  3 *m*r  )  ),  or  0(  min  (  Ts,  Ct  )  ), 
and  2*(  max  (  3*n,  3*m  )  )  accesses  to  the  secondary _ st or  age 
are  required.  This  is  an  access  complexity  of  the  join  at 
the  controller  which  is  C(  max  (  Cs/r,  Ct/r)  ). 

3.  33'*?  ARISONS  3ETVEEN  THE  TWO  ALTERNATIVES 

Table  4  illustrates  the  time  complexities  for-  pot •: 
alternatives,  using  a  sort-match  algorithm.  Again,  tr.e 
computing  complexity,  the  access  complexity,  and  the 
eommun  i  cat  ion  complexity  are  given  separately.  "The 
computing  complexity  includes  the  sum  of  tne  computing 
complexities  of  the  internal  sort  phase,  the  merge  phase, 
and  the  join. 

The  access  complexity  includes  the  sum  of  the  access 
complexities  of  the  internal  sort  phase,  the  merge  phase, 
and  the  join.  Finally,  the  communication  complexity  shows 
the  time  required  to  transmit  the  source  and  the  target 
records  among  the  backends  and  between  the  controller  and 
the  backends.  The  complexity  formulas  of  accesses  to  the 
secondary  storage  are  given  only  for  the  adcitional  accesses 
necessary  to  complete  the  join.  In  other  words,  accesses  to 
the  secondary  storage  to  retrieve  the  records  to  perform 
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selection  end  projection  before  the  join  stares  are  not 
included.  Let  us  examine  the  Table  d  row  by  row  comparing 
the  two  alternatives. 

The  computing  complexity  in  the  backends  for  t"e 
alternative  A.1  is  larger  than  the  alternative  A. 2  since 
each  backend  in  alternative  A.1  contains  all  the  target  file 
records.  On  the  contrary,  the  alternative  A. 2  has  a 
computing  complexity  at  the  controller.  Therefore, 
alternative  A.1  is  better  than  alternative  A. 2  with  r  e  g  a  r  : 
to  meeting  the  design  goal  of  minimizing  cent” oiler 
function. 

The  alternative  A.1  requires  more  accesses  to  tre 
secondary  storage  for  the  backends  than  the  alternative  A  .2. 
However,  again,  the  alternative  A. 2  requires  more  accesses 
to  the  secondary  storage  at  the  controller.  Therefore, 
alternative  A.1  is  better  than  alternative  A.  2  due  to 
meeting  the  design  goal  of  minimizing  controller  function. 

Despite  the  situation  that  the  alternative  A. 2  has  lower 
transmission  overhead,  this  may  be  negligible  when  balanced 
against  I/O  requirements  at  the  controller.  Therefore,  we 
will  recommend  the  alternative  A. 2,  i.e.,the  backends 
perform  the  join,  for  implementation  of  the  join  using  a 
sort-match  algorithm  in  MDBS.  This  alternative  best  meets 
the  design  goals  of  minimizing  controller  function  and 
sharing  the  work  equally  at  the  backends. 


A.l.  THE  BACKENDS  SHARE  THE  JOIN  BY  A. 2.  HIE  CONTROLLER  PERFORMS 


Bs:  Backends,  C:  Controller 

Table  A.  Alternative  a.1.  vs  Alternative  A. 2 
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Table  5  depicts  the  time  complexities  for  the  best 
alternative  using  the  straightforward  join  algorithm  and  the 
best  alternative  using  the  sort-match  join  algorithm,  let  us 
now  compare  the  two  alternatives.  Let  us  assume  that  the 
number  of  source  records  is  equal  to  the  number  of  ta.-get 
records,  i.e.,  Cs  =  Ct.  Let  the  block  size,  - ,  be  e:ual  t: 
6d.  We  will  compare  the  access  complexities  an:  v.e 
computing  complexities  of  tne  two  alternatives  wit", 
selected  number  of  records  involved,  Cs  and  Ct,  the  -esult 
proportionality,  q,  and  varying  the  number  of  backends,  3 . 

Figure  3.1  shows  the  access  complexities  for  Cs=Ct=2  and 

14 

2  ,  qsO.1,  and  number  of  backends,  3,  from  2  to  ’5.  The 
increasing  number  of  backends  has  little  effect  on  access 
complexity  when  a  sort-match  algorithm  is  used,  however, 
when  the  straightforward  algorithm  is  used,  the  access 
complexity  decreases  sharply  as  the  number  of  backends 
increases.  Note  that  for  a  large  number  of  backends,  3>1b, 
the  reduction  becomes  negligible.  The  access  complexity 
required  for  the  sort-match  algorithm  is  always  less  than 
that  required  for  the  straightforward  algorithm,  and  is 
substantially  less  for  a  smaller  number  of  backends. 


Figure  3.2  shows 


z  n  s 
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computing  complexity 
algorithms  with  Cs=Ct=2  and  2  ,  q=C.'.  In  this  oase,  ootr. 
alternatives  have  decreasing  computing  complexity.  Again, 
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cigure  3.3  shows  the  communication  complexity  of  one 
straightforward  join  algorithm  with  Cs=Ct=  2  ,  2,  and  2.  T.ne 
quo t ion,  q,  ranges  from  0.1  to  0.5.  Figure  S.u  depicts  the 
same  complexity  for  the  sort-match  join  algorithm  with 

'3  i3 

Cs=Ct=2  -  2  ,  and  q=0. 1-0.5.  These  two  figures  illustrate 
that  increasing  Cs,  Ct,  and  q  affect  on  the  communication 
complexity  of  the  straightforward  join  algorithm  more  than 
the  sort-match  join  algorithm. 

D.  RECOMMENDED  PROPOSAL  FOR  THE  DISTRIBUTION  OF  THE  JOIN 

OPERATION 

In  the  previous  sections,  we  have  analyzed  the 
alternatives  of  the  distribution  of  the  functionality  and 
shown  the  tradeoffs  and  the  advantages  of  each  one  by  using 
two  different  join  algorithms,  namely  the  straightforward 
join  algorithm  and  the  sort-match  join  algorithm. 


Briefly,  alternative  3.1  in  Chapter  VII  using  a 
straightf orward  join  algorithm  and  tne  alternative  A.i  in 
Chapter  VIII  using  a  sort -match  join  algorithm  are  the  best 
alternatives  for  distribution  of  the  functionality.  In  both 
alternatives,  the  functional  unit  performing  the  join  in 
MDBS  is  the  backends.  Finally,  comparisons  between  these  two 
alternatives  have  shown  that  the  alternative  A.I,  join  at 
the  bac<ends  using  a  sort -match  join  algorithm,  is  bette” 
than  the  alternative  3.1,  join  at  the  backends  using  a 
straightforward  join  algorithm,  on  account  of  meeting  t-e 
design  goal  of  minimizing  the  communication  overhead  between 
the  controller  and  the  backends. 

Having  analyzed  all  the  alternatives,  the  nest 
appropriate  choice  for  implementing  the  join  in  MDBS  is  that 
each  backend  performs  a  partial  join  with  its  portion  o f 
source  records  and  all  target  records .  Then , the  results  are 
sent  to  the  controller.  The  controller  will  then  forward  the 


final  result  to  the  host  computer 
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Therefore,  it  -ray  be  desiraole  to  select  fee  larger  of  f  -  ? 
two  files  as  the  file  to  be  transmitted. 

This  thesis  provides  the  groundwork  for  further 
analysis.  We  have  presented  computing,  access,  arc 
communication  complexities  separately.  If  some  relative 
weights  can  be  assigned  to  these  complexities,  further 
analyses  to  evaluate  the  tradeoffs  may  lead  to  providing  a 
choice  among  several  alternatives,  depending  on  the 
distribution  of  the  relevant  records  3mong  the  backends,  the 
communication  cost  and  the  acess  complexity. 
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